Efficiently Updating Cost Repository Values for Query Optimization on Web Data Sources in a Mediator/Wrapper Environment
Abstract
Optimizing access to sources in a mediator/wrapper environment is a critical need. For a variety of reasons, relational optimization techniques are of no use for HTTP-based web sources, so new approaches that account for client/server communication costs must be devised. This paper describes a cost model that stores values for a complete set of web-source-focused parameters obtained by the web wrappers. A novel updating technique handles the values measured by the wrappers in previous query executions and efficiently generates a new model instance in each iteration. This instance allows rapid value updates in response to changes in server quality or bandwidth, which are typical in this context. The results of these techniques are demonstrated both theoretically and by means of an implementation showing how performance improves on real-world web sources compared to classical approaches.
Similar resources
Leveraging Mediator Cost Models with Heterogeneous Data Sources
Distributed systems require declarative access to diverse information sources. One approach to solving this heterogeneous distributed database problem is based on mediator architectures. In these architectures, mediators accept queries from users, process them with respect to wrappers, and return answers. Wrappers provide access to underlying sources. To efficiently process queries, the mediator m...
Searching and Querying Wide-Area Distributed Collections
The rapid proliferation of widely-distributed data and document collections raises the need for wrapper/mediator architectures that can handle the challenges of wide area query processing. Traditional query and search techniques do not scale to large numbers of repositories and cannot cope with the unpredictable performance and (un)availability of access to such repositories. Research at the U...
Validating Mediator Cost Models with Disco
Disco is a mediator system developed at INRIA for accessing heterogeneous data sources over the Internet. In Disco, mediators accept queries from users, process them with respect to wrappers, and return answers. Wrappers provide access to underlying sources. To efficiently process queries, the mediator performs cost-based query optimization. In a heterogeneous distributed database, cost estimate based que...
QUERY PROCESSING OVER INCOMPLETE AUTONOMOUS WEB DATABASES by Hemal Khatri
Incompleteness due to missing attribute values (aka “null values”) is very common in autonomous web databases, on which user accesses are usually supported through mediators. Traditional query processing techniques that focus on the strict soundness of answer tuples often ignore tuples with critical missing attributes, even if they wind up being relevant to the user query. Ideally, the mediator...
Integration of Heterogeneous Data Sources with Limited Capabilities in the Object-Oriented Mediator Engine AMOS II
Information is becoming an increasingly valuable asset in today’s organizations. The need therefore arises to create an integrated view over all available data sources. Several technical problems must be overcome in the design and implementation of a system for integrating different data sources. The main obstacles include autonomy, data heterogeneity, and the different query capabilities of the repo...
Publication date: 2006